integral operator
Appendix A: Proof of Proposition 1 in Section 2
Proof. ReLU(T(v + u) + b) = ReLU(Tv + b) for some u ≠ 0, that is, ReLU(T(·) + b) is not injective. By injectivity of T, we finally get a = b.
Remark 2. An example that satisfies (3.1) is the neural operator whose construction is given by the combination of "pairs of projections" discussed in Kato [2013, Section I.4.6] with the idea presented in Puthawala et al. [2022b, Lemma 29]. Thus, in both cases, H is injective.
Remark 4. We make the following observations using Theorem 1: Leaky ReLU is one example that satisfies (ii) in Theorem 1 (compare Puthawala et al. [2022a, Theorem 15]). We first revisit layerwise injectivity and bijectivity in the case of the finite-rank approximation.
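As a quick numerical illustration of the non-injectivity point above (a minimal sketch with a hypothetical map T, bias b, and inputs, not taken from the paper): ReLU collapses all non-positive pre-activations to zero, so two distinct inputs can produce identical layer outputs, whereas Leaky ReLU is strictly monotone and keeps them distinguishable.

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

def leaky_relu(x, alpha=0.1):
    return np.where(x > 0, x, alpha * x)

# Hypothetical injective linear map T and bias b.
T = np.eye(2)
b = np.array([-1.0, -1.0])

v = np.array([0.2, 0.3])
u = np.array([0.5, 0.0])   # u != 0, chosen so both pre-activations stay negative

# ReLU(T v + b) == ReLU(T (v + u) + b): the layer x -> ReLU(T x + b) is not injective.
print(relu(T @ v + b), relu(T @ (v + u) + b))              # both [0. 0.]

# Leaky ReLU never flattens the pre-activation, so the two inputs stay distinct.
print(leaky_relu(T @ v + b), leaky_relu(T @ (v + u) + b))
```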
- Europe > Finland > Uusimaa > Helsinki (0.04)
- Asia > India > Tripura (0.04)
- North America > United States > South Dakota (0.04)
- Europe > Netherlands > North Holland > Amsterdam (0.04)
- North America > United States > Rhode Island > Providence County > Providence (0.04)
- North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
- North America > Canada (0.04)
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
- Europe > Germany > Baden-Württemberg > Tübingen Region > Tübingen (0.04)
On Learning Over-parameterized Neural Networks: A Functional Approximation Perspective
We consider training over-parameterized two-layer neural networks with Rectified Linear Unit (ReLU) activations using the gradient descent (GD) method. Inspired by a recent line of work, we study the evolution of network prediction errors across GD iterations, which can be neatly described in a matrix form. When the network is sufficiently over-parameterized, these matrices individually approximate {\em an} integral operator which is determined by the feature vector distribution $\rho$ only. Consequently, the GD method can be viewed as {\em approximately} applying the powers of this integral operator on the underlying/target function $f^*$ that generates the responses/labels. We show that if $f^*$ admits a low-rank approximation with respect to the eigenspaces of this integral operator, then the empirical risk decreases to this low-rank approximation error at a linear rate which is determined by $f^*$ and $\rho$ only, i.e., the rate is independent of the sample size $n$. Furthermore, if $f^*$ has zero low-rank approximation error, then, as long as the width of the neural network is $\Omega(n\log n)$, the empirical risk decreases to $\Theta(1/\sqrt{n})$. To the best of our knowledge, this is the first result showing the sufficiency of nearly-linear network over-parameterization. We provide an application of our general results to the setting where $\rho$ is the uniform distribution on the spheres and $f^*$ is a polynomial. Throughout this paper, we consider the scenario where the input dimension $d$ is fixed.
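The following is a minimal numpy sketch of the viewpoint in the abstract, under the standard linearized-dynamics assumption: with a fixed Gram/kernel matrix H (a stand-in for the integral operator; the paper's analysis uses the kernel induced by $\rho$) and predictions initialized near zero, GD makes the prediction error evolve as $e_{t+1} = (I - \eta H)\,e_t$, so the components of the target along the top eigenspaces of H decay at a rate set by the eigenvalues. The width, target function, and sample distribution below are hypothetical, not the paper's exact construction.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical setup: n samples on the unit circle, labels from a polynomial target f*.
n = 200
theta = rng.uniform(0, 2 * np.pi, size=n)
X = np.stack([np.cos(theta), np.sin(theta)], axis=1)
y = X[:, 0] ** 2 - X[:, 1]

# A fixed ReLU random-feature Gram matrix H (a stand-in for the integral operator).
m = 5000                                # width: heavily over-parameterized
W = rng.normal(size=(m, 2))
Phi = np.maximum(X @ W.T, 0.0) / np.sqrt(m)
H = Phi @ Phi.T

# Linearized GD on the predictions: e_{t+1} = (I - eta * H) e_t, starting from e_0 = y
# (i.e., assuming zero initial predictions), so components of y along the top
# eigenvectors of H shrink fastest.
eta = 1.0 / np.linalg.norm(H, 2)
err = y.copy()
for t in range(2000):
    err = err - eta * (H @ err)

print("initial empirical risk:", np.mean(y ** 2))
print("final empirical risk:  ", np.mean(err ** 2))
```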
Summary
We would like to thank the entire review team for their efforts and insightful comments. The convergence rates in [DZPS18] ([DZPS18] refers to arXiv:1810.02054) approach zero; the ImageNet dataset, for instance, has 14 million images, and for such applications a non-diminishing convergence rate is more desirable. Response to the concern on the fixed second layer: the same assumption is made in [ADH+19] and [ZCZG18] (arXiv:1811.08888).
SVD-NO: Learning PDE Solution Operators with SVD Integral Kernels
Koren, Noam, Mackenbach, Ralf J. J., van Sloun, Ruud J. G., Radinsky, Kira, Freedman, Daniel
Neural operators have emerged as a promising paradigm for learning solution operators of partial differential equations (PDEs) directly from data. Existing methods, such as those based on Fourier or graph techniques, make strong assumptions about the structure of the kernel integral operator, assumptions which may limit expressivity. We present SVD-NO, a neural operator that explicitly parameterizes the kernel by its singular-value decomposition (SVD) and then carries out the integral directly in the low-rank basis. Two lightweight networks learn the left and right singular functions, a diagonal parameter matrix learns the singular values, and a Gram-matrix regularizer enforces orthonormality. As SVD-NO approximates the full kernel, it obtains a high degree of expressivity. Furthermore, due to its low-rank structure the computational complexity of applying the operator remains reasonable, leading to a practical system. In extensive evaluations on five diverse benchmark equations, SVD-NO achieves a new state of the art. In particular, SVD-NO provides greater performance gains on PDEs whose solutions are highly spatially variable. The code of this work is publicly available at https://github.com/2noamk/SVDNO.git.
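A hedged sketch of the low-rank kernel-integral idea the abstract describes (the released implementation is in the repository above; the module below, its layer sizes, and the rank are hypothetical): two small networks output the left and right singular functions on the grid, a diagonal parameter holds the singular values, the integral is taken in the low-rank basis via a Riemann sum, and a Gram-matrix penalty pushes the learned bases toward orthonormality.

```python
import torch
import torch.nn as nn

class LowRankKernelIntegral(nn.Module):
    """Hypothetical SVD-style kernel integral layer (not the released code).

    k(x, y) ~= sum_r phi_r(x) * sigma_r * psi_r(y), so on a grid {y_j}:
    (K v)(x_i) ~= sum_r sigma_r * phi_r(x_i) * (1/N) * sum_j psi_r(y_j) * v(y_j).
    """

    def __init__(self, coord_dim=1, rank=8, width=64):
        super().__init__()
        def basis_net():
            return nn.Sequential(nn.Linear(coord_dim, width), nn.GELU(),
                                 nn.Linear(width, rank))
        self.phi = basis_net()                        # left singular functions
        self.psi = basis_net()                        # right singular functions
        self.sigma = nn.Parameter(torch.ones(rank))   # diagonal of singular values

    def forward(self, x, v):
        # x: (N, coord_dim) grid points; v: (N,) input function sampled on the grid
        Phi = self.phi(x)                             # (N, rank)
        Psi = self.psi(x)                             # (N, rank)
        coeffs = Psi.T @ v / v.shape[0]               # <psi_r, v> via a Riemann sum
        return Phi @ (self.sigma * coeffs)            # (N,) output function values

    def gram_penalty(self, x):
        # Regularizer encouraging the learned bases to be orthonormal.
        loss = 0.0
        for net in (self.phi, self.psi):
            B = net(x)
            G = B.T @ B / x.shape[0]
            loss = loss + ((G - torch.eye(G.shape[0])) ** 2).sum()
        return loss

# Toy 1-D usage (purely illustrative):
x = torch.linspace(0.0, 1.0, 128).unsqueeze(-1)
v = torch.sin(2.0 * torch.pi * x).squeeze(-1)
layer = LowRankKernelIntegral()
u = layer(x, v)               # (128,) transformed function values
reg = layer.gram_penalty(x)   # added to the training loss
```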
- Asia > India > Tripura (0.05)
- Europe > Netherlands > North Brabant > Eindhoven (0.04)
- Asia > Middle East > Israel > Tel Aviv District > Tel Aviv (0.04)
- (3 more...)
Infinite Neural Operators: Gaussian processes on functions
de Souza, Daniel Augusto, Zhu, Yuchen, Cunningham, Harry Jake, Saporito, Yuri, Mesquita, Diego, Deisenroth, Marc Peter
A variety of infinitely wide neural architectures (e.g., dense NNs, CNNs, and transformers) induce Gaussian process (GP) priors over their outputs. These relationships provide both an accurate characterization of the prior predictive distribution and enable the use of GP machinery to improve the uncertainty quantification of deep neural networks. In this work, we extend this connection to neural operators (NOs), a class of models designed to learn mappings between function spaces. Specifically, we show conditions for when arbitrary-depth NOs with Gaussian-distributed convolution kernels converge to function-valued GPs. Based on this result, we show how to compute the covariance functions of these NO-GPs for two NO parametrizations, including the popular Fourier neural operator (FNO). With this, we compute the posteriors of these GPs in regression scenarios, including PDE solution operators. This work is an important step towards uncovering the inductive biases of current FNO architectures and opens a path to incorporate novel inductive biases for use in kernel-based operator learning methods.
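Once a covariance function for such an NO-GP is in hand (the paper derives these for specific parametrizations, including the FNO), the posterior in a regression scenario follows from standard GP conditioning. The sketch below uses the textbook conditioning formulas with a hypothetical RBF stand-in kernel on grid points, not the NO-GP covariance from the paper.

```python
import numpy as np

def gp_posterior(K_train, K_cross, K_test, y, noise=1e-4):
    """Standard GP conditioning; the kernels would come from the NO-GP covariance."""
    n = K_train.shape[0]
    L = np.linalg.cholesky(K_train + noise * np.eye(n))
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))   # K_train^{-1} y
    mean = K_cross.T @ alpha
    v = np.linalg.solve(L, K_cross)
    cov = K_test - v.T @ v
    return mean, cov

# Hypothetical stand-in covariance between (discretized) outputs: an RBF kernel
# on grid locations, NOT the NO-GP covariance computed in the paper.
def rbf(a, b, ell=0.2):
    d2 = (a[:, None] - b[None, :]) ** 2
    return np.exp(-0.5 * d2 / ell ** 2)

x_train = np.linspace(0, 1, 20)
x_test = np.linspace(0, 1, 100)
y_train = np.sin(2 * np.pi * x_train)          # observed solution values on the grid

mean, cov = gp_posterior(rbf(x_train, x_train), rbf(x_train, x_test),
                         rbf(x_test, x_test), y_train)
```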
- North America > Canada > Ontario > Toronto (0.14)
- South America > Brazil > Rio de Janeiro > Rio de Janeiro (0.04)
- North America > United States > California (0.04)
- Europe > Germany > Baden-Württemberg > Stuttgart Region > Stuttgart (0.04)